The interaction of sampling ratio and modelling method in prediction of binary target with rare target class
In many practical predictive data mining problems with a binary target, one of the target
classes is rare. In such a situation it is common practice to decrease the ratio of common to
rare class cases in the training set by under-sampling the common class. The relationship
between the ratio of common to rare class cases in the training set and model performance
was investigated empirically on three artificial and three real-world data sets. The results
indicated that a flexible modelling method without regularisation benefits in both mean and
variance of performance from a larger ratio when evaluated on a criterion sensitive to
overfitting, and benefits in mean but not variance of performance when evaluated on a
criterion less sensitive to overfitting. For an inflexible modelling method and a flexible
method with regularisation, the effects of a larger ratio were less consistent. In no
circumstances, however, was a larger ratio found to be detrimental to model performance,
whichever criterion was used.
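As an illustration of the under-sampling step described in this abstract, the following is a minimal sketch and not the authors' procedure. The function name `undersample`, its parameters, and the toy data are assumptions made for this example. It keeps every rare-class case and draws a random subset of common-class cases so that the common-to-rare ratio in the training set does not exceed a chosen value.

```python
import random


def undersample(rows, labels, ratio, rare_label=1, seed=0):
    """Keep all rare-class cases and at most ratio * (rare count)
    randomly chosen common-class cases. Hypothetical helper."""
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(labels) if y == rare_label]
    common = [i for i, y in enumerate(labels) if y != rare_label]
    # Never ask for more common-class cases than are available.
    n_keep = min(len(common), int(ratio * len(rare)))
    kept = rare + rng.sample(common, n_keep)
    rng.shuffle(kept)  # avoid leaving the rare cases grouped at the front
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumed): 5 rare cases among 100.
labels = [1] * 5 + [0] * 95
rows = list(range(100))
x, y = undersample(rows, labels, ratio=3)
print(sum(y), len(y) - sum(y))  # prints "5 15"
```

Repeating the call with several values of `ratio` and several seeds would give the kind of mean and variance comparison the abstract refers to, under whatever modelling method and evaluation criterion are chosen.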
Bindings as bounded natural functors
We present a general framework for specifying and reasoning about syntax with bindings. Abstract binder types are modeled using a universe of functors on sets, subject to a number of operations that can be used to construct complex binding patterns and binding-aware datatypes, including non-well-founded and infinitely branching types, in a modular fashion. Despite not committing to any syntactic format, the framework is “concrete” enough to provide definitions of the fundamental operators on terms (free variables, alpha-equivalence, and capture-avoiding substitution) and reasoning and definition principles. This work is compatible with classical higher-order logic and has been formalized in the proof assistant Isabelle/HOL.
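To make the three operators named in this abstract concrete, here is a minimal sketch for one fixed syntax, the untyped lambda calculus. It is an assumption chosen for illustration only: the paper's framework is general over binder types and is formalized in Isabelle/HOL, whereas the classes `Var`, `App`, `Lam` and the functions below are hypothetical names for a single finite, well-founded instance.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


def free_vars(t):
    """Set of variable names occurring free in t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, App):
        return free_vars(t.fn) | free_vars(t.arg)
    return free_vars(t.body) - {t.param}


def fresh(avoid):
    """First name of the form v0, v1, ... that is not in avoid."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)


def subst(t, x, s):
    """Capture-avoiding substitution of s for free occurrences of x in t."""
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fn, x, s), subst(t.arg, x, s))
    if t.param == x:  # x is shadowed, so nothing to replace
        return t
    if t.param in free_vars(s):  # rename the binder to avoid capture
        new = fresh(free_vars(s) | free_vars(t.body) | {x})
        body = subst(t.body, t.param, Var(new))
        return Lam(new, subst(body, x, s))
    return Lam(t.param, subst(t.body, x, s))


def alpha_eq(a, b, pairs=()):
    """Equality up to renaming of bound variables."""
    if isinstance(a, Var) and isinstance(b, Var):
        for p, q in reversed(pairs):  # innermost binder first
            if a.name == p or b.name == q:
                return a.name == p and b.name == q
        return a.name == b.name
    if isinstance(a, App) and isinstance(b, App):
        return alpha_eq(a.fn, b.fn, pairs) and alpha_eq(a.arg, b.arg, pairs)
    if isinstance(a, Lam) and isinstance(b, Lam):
        return alpha_eq(a.body, b.body, pairs + ((a.param, b.param),))
    return False


# Substituting y for x under a binder named y forces a renaming.
term = Lam("y", App(Var("x"), Var("y")))
result = subst(term, "x", Var("y"))
print(alpha_eq(result, Lam("z", App(Var("y"), Var("z")))))  # prints "True"
```

Without the renaming branch, the substituted `y` would be captured by the binder and the result would wrongly be equivalent to a term with no free variables.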